Vectorised Spreading Activation Algorithm for Centrality Measurement
Abstract
TROUSSOV, A., DAŘENA, F., ŽIŽKA, J., PARRA, D., BRUSILOVSKY, P.: Vectorised Spreading Activation algorithm for centrality measurement. Acta univ. agric. et silvic. Mendel. Brun., 2011, LIX, No. 7, pp. 469–476

Spreading Activation is a family of graph-based algorithms widely used in areas such as information retrieval, epidemic models, and recommender systems. In this paper we introduce a novel Spreading Activation (SA) method that we call Vectorised Spreading Activation (VSA). VSA algorithms, like “traditional” SA algorithms, iteratively propagate the activation from the initially activated set of nodes to the other nodes in a network through outward links. The level of a node’s activation can be used as a centrality measurement in accordance with the dynamic model-based view of centrality, which focuses on the outcomes for nodes in a network where something is flowing from node to node across the edges. Representing the activation by vectors allows the use of information about the various dimensionalities of the flow and about the dynamics of the flow. In this capacity, VSA algorithms can model a multitude of complex multidimensional network flows. We present the results of numerical simulations on small synthetic social networks and on multidimensional network models of folksonomies, which show that the results of VSA propagation are more sensitive to the positions of the initial seed and to the community structure of the network than the results produced by traditional SA algorithms. We tentatively conclude that VSA methods could be instrumental in developing scalable and computationally efficient algorithms that achieve synergy between the computation of centrality indexes and the detection of community structures in networks. Based on our preliminary results and on improvements made over previous studies, we foresee advances in the state of the art of this family of algorithms and in their applications to centrality measurement.
Keywords: centrality, network flow, spreading activation, graph-based methods, recommender systems, data mining

Spreading Activation is a family of graph-based algorithms widely used in areas such as information retrieval, epidemic models, and recommender systems (Crestani, 1997; Rocha, Schwabe, Poggi de Aragao, 2004; Troussov et al., 2009; Dařena, Troussov, Žižka, 2010). In this paper we introduce a novel Spreading Activation (SA) method that we call Vectorised Spreading Activation (VSA). VSA algorithms, like “traditional” SA algorithms, iteratively propagate the activation from the initial set of nodes, referred to as the seed, to the other nodes in a network through outward links. The level of a node’s activation can be used as a centrality measurement in accordance with the dynamic model-based view of centrality in (Borgatti, 2005) “that focuses on the outcomes for nodes in a network where something is flowing from node to node across the edges” (Borgatti, Everett, 2006). Representing the activation by vectors allows us to use information about the various dimensionalities of the flow and about the dynamics of the flow. In this capacity, VSA algorithms can model a multitude of complex multidimensional network flows. In this paper we do not discuss interpretations of the nature of “what is flowing”; we treat the network flow as the propagation of an abstract relevancy measure, also called the activation.

The activation is a function on the nodes of the network, and spreading activation is an example of a discrete flow process in which the activation of a node at the next iteration is explicitly specified as a function of the activations at the node and its neighbours. Usually the activation is a real-valued or Boolean-valued function.
However, a real-valued function on the nodes of the network cannot describe multiple dimensionalities of the activation, and below we present examples where considering multidimensional flow processes might be useful. Therefore in this paper we assume that the value of the relevancy measure can be represented by vectors of real-valued, Boolean-valued or even other scalar or list (compound) components. We do not assume that vector algebra operations are necessarily meaningful for multidimensional flows; the term “vector” is therefore used here not in a strict mathematical sense, but in the sense common in computer science: a vector or “tuple” is a list of components, such as a set of value attributes in relational databases.

The focus of this paper is on “how it is flowing” more than on “what is flowing”. Traditional SA algorithms model diffusion-like processes, where at each iteration the future depends only on the present state (the distribution of the relevancy function on the network) and not on the history of how this distribution was reached. There are physical processes that do not fit into this scheme. For instance, consider a system of material points that oscillate around fixed positions on a regular grid under the forces of interaction with neighbouring points. Knowing the positions of the points at the current iteration is not enough to compute their positions at the next iteration. These positions depend not only on the current positions of the material points and the interaction between them, but also on their velocities (and approximating a velocity requires knowing at least two previous states of a material point, not one as in diffusion-like processes).

Another common property of diffusion-like network flows is that they are usually inherently “linear” as mathematical operators that map the input function on network nodes to the output function.
For instance, the distribution resulting from activation initiated in two network nodes is the same as the sum of the distributions resulting from two independent processes, each initiated in one node of the pair. A more formal description of linearity requires a black box description of network flow. The black box description of any network flow is the operator $H_t$ which maps the input function on network nodes $x(v)$ to the output function $y(v)$, where $t$ is the continuous or discrete parameter representing time, $v$ ranges over network nodes, and $x(\cdot)$ and $y(\cdot)$ are real-valued (or complex-valued, vector-valued, etc.) functions. Most diffusion-like network flows $H_t$ are linear, i.e. given two initial functions $x_1(v)$ and $x_2(v)$, the result satisfies the properties of superposition and scaling: $H_t(\alpha x_1 + \beta x_2) = \alpha H_t(x_1) + \beta H_t(x_2)$ for any scalar values $\alpha$ and $\beta$.

As outlined above, diffusion-like network flow methods share certain fundamental properties. VSA algorithms can model a multitude of complex network flow processes, and in this capacity could help to overcome limitations of currently used network flow methods.

The rest of the paper is organized as follows. In section Vectorised Spreading Activation Algorithms we describe the VSA algorithm. In section Applications to User Similarity in Folksonomies we give a formal model of folksonomies as a multidimensional network with four types of nodes, corresponding to users, resources, tags and instances of tagging, and then present the results of numerical simulations using VSA. In that section we also present the results of ranking using VSA on small synthetic models of social networks.
Section Clustering Behaviour of VSA Results at Different Levels of Analysis demonstrates that the behaviour of VSA is sensitive to the clustering structure of networks, and section Applications to Centrality Measurement in Large Scale Social Networks shows how the VSA algorithm can be used to calculate various centrality measures in a new way. Finally, the last section presents the conclusions and future work.

Vectorised Spreading Activation Algorithms

Spreading activation algorithms iteratively propagate the activation from the initial set of nodes, referred to as the seed, to the other nodes in a network through outward links (Troussov et al., 2009). Usually, this propagation continues until the behaviour of the system stabilizes near the so-called limit distribution, or until the algorithm is stopped by a constraint such as a limit on the total number of iterations. Representing the activation by vectors (or, more generally, by ordered sequences of non-homogeneous values whose components might belong to different universes: numbers, binary and Boolean data types, etc.) makes it possible to store at each node information about the various dimensionalities of the activation as well as information about the dynamics of the propagation process.

We introduce two mechanisms that exploit vector-valued activation to modify the behaviour of the propagation. The idea behind the first mechanism is to give the nodes a kind of “inertia”: the change in a node’s activation value depends on how that value changed over previous iterations. In addition, this “inertia” is used to speed up the convergence of the algorithm (i.e. reaching the limit distribution). For instance, if the activation at a particular node decreases on each iteration, this mechanism makes the decrease faster.
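The text above does not give an update formula for the inertia mechanism, so the following Python sketch is only one possible reading, under stated assumptions. Each node stores a two-component vector `(current, previous)`. A memoryless diffusion step (the hypothetical `plain_step`, in which a node keeps a fraction `keep` of its activation and splits the rest equally among its outward links) is then corrected by `inertia` times the node's most recent change. All function names, parameters and the update rule itself are illustrative and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]                    # node -> nodes reached by outward links
VectorState = Dict[str, Tuple[float, float]]    # node -> (current, previous) activation


def plain_step(graph: Graph, act: Dict[str, float], keep: float = 0.5) -> Dict[str, float]:
    """Memoryless step (assumed rule): keep a fraction, split the rest over out-links."""
    new = {v: 0.0 for v in graph}
    for v, targets in graph.items():
        if targets:
            new[v] += keep * act[v]
            share = (1.0 - keep) * act[v] / len(targets)
            for w in targets:
                new[w] += share
        else:
            new[v] += act[v]                    # nodes without out-links retain everything
    return new


def inertial_step(graph: Graph, state: VectorState,
                  keep: float = 0.5, inertia: float = 0.5) -> VectorState:
    """Assumed inertia rule: push the diffusion result further along the last observed change."""
    current = {v: s[0] for v, s in state.items()}
    base = plain_step(graph, current, keep)
    new_state: VectorState = {}
    for v, (cur, prev) in state.items():
        trend = cur - prev                      # change made by the previous iteration
        value = max(0.0, base[v] + inertia * trend)   # clamp so activation stays non-negative
        new_state[v] = (value, cur)             # remember the old value for the next step
    return new_state


def run(graph: Graph, seed: Dict[str, float], steps: int,
        keep: float = 0.5, inertia: float = 0.5) -> Dict[str, float]:
    """Iterate from the seed; the initial trend of every node is zero."""
    state: VectorState = {v: (seed.get(v, 0.0), seed.get(v, 0.0)) for v in graph}
    for _ in range(steps):
        state = inertial_step(graph, state, keep, inertia)
    return {v: s[0] for v, s in state.items()}


if __name__ == "__main__":
    chain = {"a": ["b"], "b": ["c"], "c": []}
    print(run(chain, {"a": 1.0}, steps=2, inertia=0.0))   # memoryless behaviour
    print(run(chain, {"a": 1.0}, steps=2, inertia=0.5))   # seed node "a" drains faster
```

With `inertia=0.0` the sketch reduces to the memoryless step. With a positive value, a node whose activation fell in the previous iteration falls further in the next one, which is the behaviour the paragraph describes. Note that this simple correction does not conserve the total activation; a conserving variant would need a normalisation step that the source text does not specify.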
From the technical point of view, the primary goal of the second mechanism, a process-dependent constraint on the number of iterations, is to speed up the convergence of the algorithm without confining the spread to the vicinity of the initial seed. Traditional spreading activation algorithms frequently employ a constraint on the total number of iterations, so that the redistribution of the activation stops independently of the topology of the network and of the distribution of activation achieved (Dařena, Troussov, Žižka, 2010). The new mechanism we propose limits the number of input/output operations for nodes: some nodes (especially those located near the initial seeds) might be removed from the process of redistribution of the activation, while other nodes (located further from the seed) might continue to participate in it. From an application point of view, this mechanism aims to reduce the influence of globally important nodes (hubs) on the redistribution of activation at the micro level.

Algorithms which propagate a real-valued activation function F usually have the following steps (Troussov et al., 2009; Troussov, Parra, Brusilovsky, 2009):
• Initialization – sets the parameters of the algorithm, the network, and the initial seed of nodes with non-zero F values.
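The second mechanism, a per-node limit on input/output operations, can be illustrated with a small Python sketch. The excerpt does not define how operations are counted, so the rule below is an assumption: a node is charged one operation for every round in which it sends or receives activation, and once it reaches `budget` it is frozen and no longer takes part. The function name `budgeted_spread`, the parameters `budget`, `keep` and `max_rounds`, and the spreading rule itself are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]   # node -> nodes reached by outward links


def budgeted_spread(graph: Graph, seed: Dict[str, float], budget: int,
                    keep: float = 0.5, max_rounds: int = 100
                    ) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Spread activation until every reachable node has used its operation budget.

    Each node carries a two-component vector: its activation and the number of
    rounds in which it has sent or received activation (assumed counting rule).
    """
    act = {v: seed.get(v, 0.0) for v in graph}
    ops = {v: 0 for v in graph}
    for _ in range(max_rounds):
        new = dict(act)
        touched = set()
        for v, targets in graph.items():
            if act[v] <= 0.0 or ops[v] >= budget:
                continue                        # inactive or frozen sender
            live = [w for w in targets if ops[w] < budget]
            if not live:
                continue                        # every receiver is frozen
            outflow = (1.0 - keep) * act[v]
            new[v] -= outflow
            for w in live:
                new[w] += outflow / len(live)
            touched.add(v)
            touched.update(live)
        if not touched:
            break                               # nothing moved: the process has stopped
        for v in touched:
            ops[v] += 1                         # charge one operation per active round
        act = new
    return act, ops


if __name__ == "__main__":
    chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
    activation, used = budgeted_spread(chain, {"a": 1.0}, budget=2)
    print(activation)
    print(used)
```

On the four-node chain, the nodes nearest the seed exhaust their budget first and drop out, while nodes further away keep redistributing for another round. The stopping point therefore depends on the process itself and not on a global iteration cap (`max_rounds` is only a safety bound).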
Publication date: 2011